We study the conditional distribution of low-dimensional projections from high-dimensional data, where the conditioning is on other low-dimensional projections. To fix ideas, consider a random $d$-vector $Z$ that has a Lebesgue density and that is standardized so that $\mathbb{E}Z = 0$ and $\mathbb{E}ZZ' = I_d$. Moreover, consider two projections defined by unit-vectors $\alpha$ and $\beta$, namely a response $y = \alpha'Z$ and an explanatory variable $x = \beta'Z$. It has long been known that the conditional mean of $y$ given $x$ is approximately linear in $x$, under some regularity conditions; cf. Hall and Li [Ann. Statist. 21 (1993) 867-889]. However, a corresponding result for the conditional variance has not been available so far. We here show that the conditional variance of $y$ given $x$ is approximately constant in $x$ (again, under some regularity conditions). These results hold uniformly in $\alpha$ and for most $\beta$'s, provided only that the dimension of $Z$ is large. In that sense, we see that most linear submodels of a high-dimensional overall model are approximately correct. Our findings provide new insights in a variety of modeling scenarios. We discuss several examples, including sliced inverse regression, sliced average variance estimation, generalized linear models under potential link violation, and sparse linear modeling.



1. Introduction. 



1.1. Informal summary. We analyze a situation where a simple model 
is used when the true model is, in fact, much more complex. This situation

is particularly common with many contemporary datasets where the num- 
ber of potentially important covariates or the number of parameters exceeds 
the sample size; examples in, say, genomics or economics abound. When 
2 H. LEEB

facing a large number of potentially important covariates or parameters, 
and a small sample size, the search for simple models is typically motivated 
by either one of two types of assumptions: Parametric assumptions, which 
postulate that the true data-generating process is given by a simple finite- 
dimensional model; and nonparametric assumptions, which postulate that 
the true data-generating process can be approximated, with arbitrary accu- 
racy, by comparatively simple finite-dimensional models. In either case, the 
underlying postulates can be difficult to verify in practice. And this difficulty 
is often further compounded by the relatively small sample size. The results 
that we obtain here provide an alternative justification for the use of simple 
models. We analyze a scenario where most simple submodels are approxi- 
mately correct, provided only that the overall model is sufficiently complex, 
irrespective of whether or not the true data-generating process is given by, 
or can be closely approximated by, a finite-dimensional simple model. This 
is the main conceptual contribution of this paper. 

On a technical level, we extend and refine a method pioneered by Hall 
and Li [11]. In that reference, the authors propose a novel approach for 
studying conditional means of linear projections under weak distributional 
assumptions; cf. Theorem 3.2 of [11]. Our core technical contribution is an
extension of Hall and Li's approach to also cover conditional variances and 
higher conditional moments, and a more explicit control of error terms that 
allows us to prove strong statements like (1.5) and (1.6), which follow. Note 
that we deal here with the largest singular value of conditional covariance 
matrices of increasing dimension, which turns out to be considerably more 
challenging than handling the norm of the conditional mean vectors that are 
treated in [11]. 
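A toy computation shows why matrices of growing dimension call for care. For a symmetric matrix $A$ and an even integer $k$, the largest singular value satisfies $\sigma_{\max}(A)^k \le \operatorname{trace} A^k$, so trace bounds are a natural tool; but with $k = 2$ (the squared Frobenius norm) the bound can stay of order one while $\sigma_{\max}(A) \to 0$. The sketch below is only an illustration, not part of the paper's argument: it uses the hypothetical matrix $A = I_d/\sqrt{d}$, whose $d$ eigenvalues all equal $1/\sqrt{d}$, and a hypothetical helper `trace_power_bound`.

```python
import math


def trace_power_bound(eigenvalues, k):
    """Return (trace A^k)^(1/k) for a symmetric matrix A with the given eigenvalues.

    For even k, each |lam|^k is one of the nonnegative summands of trace(A^k),
    so the result is an upper bound on the largest singular value max |lam|.
    """
    if k % 2 != 0:
        raise ValueError("k must be even")
    return sum(lam ** k for lam in eigenvalues) ** (1.0 / k)


# Hypothetical toy matrix A = I_d / sqrt(d): d eigenvalues, all equal to 1/sqrt(d).
for d in (100, 10_000, 1_000_000):
    eigs = [1.0 / math.sqrt(d)] * d
    sigma_max = max(abs(lam) for lam in eigs)
    print(d, sigma_max, trace_power_bound(eigs, 2), trace_power_bound(eigs, 4))
```

Here $\sigma_{\max}(A) = d^{-1/2}$, the $k = 2$ bound equals 1 for every $d$, and the $k = 4$ bound equals $d^{-1/4}$; in this deliberately simple case, only the higher even power detects that the operator norm vanishes.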

The paper is organized as follows: We continue this section with a more 
detailed overview of our findings, and with a discussion of some interesting 
consequences. In Section 2, we present our main result, namely Theorem 2.1, 
and give an outline of its proof. The proof is based on five basic steps that 
correspond to five propositions that are also given in Section 2. The (more 
technical) proofs of these propositions are relegated to the supplementary 
material [13]. 

1.2. Overview of results. Consider a random $d$-vector $Z$ that has a Lebesgue density and that is standardized so that $\mathbb{E}Z = 0$ and $\mathbb{E}ZZ' = I_d$. Throughout, we will study projections of $Z$ of the form $\alpha'Z$ and $\beta'Z$ for unit $d$-vectors $\alpha$ and $\beta$. The conditional mean of $\alpha'Z$ given $\beta'Z = x$ will be denoted by $\mathbb{E}[\alpha'Z \| \beta'Z = x]$; other conditional expectations are defined similarly.¹ Our



¹ There is a measurable function $g: \mathbb{R} \to \mathbb{R}$ so that $\mathbb{E}[\alpha'Z \| \beta'Z] = g(\beta'Z)$ holds, and we write $\mathbb{E}[\alpha'Z \| \beta'Z = x]$ for $g(x)$; the existence of $g$ is guaranteed by, say, Theorem 4.2.8 of [7].
main results are concerned with the conditional mean and with the conditional variance of $\alpha'Z$ given that $\beta'Z = x$. To introduce these, consider the following two conditions: The vector $\beta$ is such that:

(i) for each $\alpha$, the conditional mean of $\alpha'Z$ given $\beta'Z = x$ is linear in $x \in \mathbb{R}$;

(ii) for each $\alpha$, the conditional variance of $\alpha'Z$ given $\beta'Z = x$ is constant in $x \in \mathbb{R}$.

Suppose that $\alpha$ is an unknown parameter and that one can observe $y$ and $x$ given by $y = \alpha'Z$ and $x = \beta'Z$, respectively. If $\beta$ is such that both (i) and (ii) hold, then $y$ can be decomposed into the sum of a linear function of $x$ and a remainder-term, or error-term, whose conditional mean given $x$ is zero and whose conditional variance given $x$ is constant; in other words, the model

$$y = \gamma x + u$$

applies, where $\mathbb{E}[u \| x] = 0$ and $\mathrm{Var}[u \| x] = \sigma^2$, and where $\gamma \in \mathbb{R}$ and $\sigma^2 \ge 0$ are unknown parameters that are given by $\gamma = \alpha'\beta$ and $\sigma^2 = 1 - (\alpha'\beta)^2$, respectively. (Indeed, (i) implies that $\mathbb{E}[y \| x] = \mathbb{E}[\alpha'Z \| \beta'Z] = \mu + \gamma(\beta'Z)$ for some real constants $\mu$ and $\gamma$. It now follows from $\mathbb{E}Z = 0$ that $\mu = 0$, and $\mathbb{E}ZZ' = I_d$ implies that $\gamma = \alpha'\beta$. Moreover, (ii) entails that $\mathrm{Var}[\alpha'Z \| \beta'Z] = \sigma^2$ for some constant $\sigma^2$; hence $1 = \mathrm{Var}(y) = \mathrm{Var}(\alpha'Z) = \mathbb{E}[\mathrm{Var}[\alpha'Z \| \beta'Z]] + \mathbb{E}[(\mathbb{E}[\alpha'Z \| \beta'Z])^2] = \sigma^2 + (\alpha'\beta)^2$, so that $\sigma^2$ is given by $\sigma^2 = 1 - (\alpha'\beta)^2$.) These observations continue to hold also if the vectors $\alpha$ and $\beta$ are not normalized to unit length, mutatis mutandis.
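The decomposition can be checked by simulation in the Gaussian case, where (i) and (ii) hold exactly. The following is a minimal sketch, not taken from the paper: the function name, the dimension $d = 3$, the sample size and the particular unit-vectors $\alpha = (0.6, 0.8, 0)'$ and $\beta = e_1$ are illustrative assumptions. It draws $Z \sim N(0, I_d)$, fits the no-intercept least-squares slope of $y = \alpha'Z$ on $x = \beta'Z$, and reports the mean squared residual; these should be close to $\gamma = \alpha'\beta$ and $\sigma^2 = 1 - (\alpha'\beta)^2$.

```python
import random


def slope_and_residual_variance(alpha, beta, n=100_000, seed=0):
    """Simulate y = alpha'Z and x = beta'Z with Z ~ N(0, I_d).

    Returns the least-squares slope of y on x (no intercept, since E Z = 0 forces
    mu = 0) and the mean squared residual; these estimate gamma and sigma^2.
    """
    rng = random.Random(seed)
    sxx = sxy = syy = 0.0
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in alpha]
        y = sum(a * zi for a, zi in zip(alpha, z))
        x = sum(b * zi for b, zi in zip(beta, z))
        sxx += x * x
        sxy += x * y
        syy += y * y
    slope = sxy / sxx
    residual_variance = (syy - slope * sxy) / n  # sum of (y - slope * x)^2, divided by n
    return slope, residual_variance


alpha = [0.6, 0.8, 0.0]  # unit vector
beta = [1.0, 0.0, 0.0]   # unit vector, so gamma = alpha'beta = 0.6
slope, resid_var = slope_and_residual_variance(alpha, beta)
print(slope, resid_var)  # approximately 0.6 and 1 - 0.36 = 0.64
```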

Conditions (i) and (ii) are satisfied for each $\beta$ if $Z$ is normally distributed, that is, $Z \sim N(0, I_d)$. But besides the Gaussian law, the class of distributions that satisfy (i) and (ii) for each $\beta$ appears to be quite small: Indeed, if $Z$ satisfies (i) for each $\beta$, then the law of $Z$ is spherically symmetric [9]. And if, in addition, also (ii) holds for some $\beta$, then $Z$ is Gaussian [1], Theorem 4.1.4.

Under comparatively mild conditions on the distribution of $Z$, we here show that both conditions (i) and (ii) are approximately satisfied for most unit-vectors $\beta$, namely for a set of unit-vectors $\beta$ whose size, as measured by the uniform distribution on the unit-sphere in $\mathbb{R}^d$, goes to one as $d \to \infty$. To state this more formally, we first describe two preliminary results, namely (1.3) and (1.4), which follow, and then extend these to our main results in (1.5) and (1.6) below.

To introduce the two preliminary results mentioned earlier, we note that (i) and (ii) together are equivalent to the requirement that $\mathbb{E}[Z \| \beta'Z = x] = \beta x$ and $\mathbb{E}[ZZ' \| \beta'Z = x] = I_d + (x^2 - 1)\beta\beta'$ hold for each $x \in \mathbb{R}$. (In other



² When we say that $\mathbb{E}[\alpha'Z \| \beta'Z = x]$ is linear in $x$ [as in condition (i), which follows], we mean that $g(x)$ can be chosen to be linear. Similar considerations apply, mutatis mutandis, to expressions like $\mathbb{E}[Z \| \beta'Z = x]$, $\mathbb{E}[(\alpha'Z)^2 \| \beta'Z = x]$ or $\mathbb{E}[ZZ' \| \beta'Z = x]$.
words, the first two moments of the conditional distribution coincide with what they would be in the Gaussian case.) From this, it is easy to see that (i) and (ii) are also equivalent to the requirement that both

$$\|\mathbb{E}[Z \| \beta'Z = x]\|^2 - x^2 = 0 \quad\text{and} \tag{1.1}$$

$$\|\mathbb{E}[ZZ' \| \beta'Z = x] - (I_d + (x^2 - 1)\beta\beta')\| = 0 \tag{1.2}$$

hold for each $x \in \mathbb{R}$. Note that we use the notation $\|\cdot\|$ to denote both the Euclidean norm of vectors as in (1.1) and the operator norm of matrices as in (1.2). The left-hand side of (1.1) can be written as

$$\|\mathbb{E}[Z \| \beta'Z = x]\|^2 - x^2 = \sup_{\alpha} |\mathbb{E}[\alpha'Z \| \beta'Z = x] - \alpha'\beta x|^2$$

with the supremum taken over all unit-vectors $\alpha \in \mathbb{R}^d$, as is elementary to verify.³ Hence, the left-hand side of (1.1) is always nonnegative and can be interpreted as the worst-case deviation of the regression function $\mathbb{E}[\alpha'Z \| \beta'Z = x]$ from a linear function at $x$. The left-hand side of (1.2) can be interpreted in a similar fashion. For fixed $x \in \mathbb{R}$, the condition (1.1) is approximately satisfied for most $\beta$'s if $d$ is large, under the assumptions of Theorem 3.2 in [11] [see also equation (1.5) in that reference], in the sense that

$$\nu\{\beta \in \mathbb{R}^d : \|\mathbb{E}[Z \| \beta'Z = x]\|^2 - x^2 > \varepsilon\} \xrightarrow[d \to \infty]{} 0 \tag{1.3}$$

for each fixed $x \in \mathbb{R}$ and for each $\varepsilon > 0$, where $\nu$ denotes the uniform distribution on the unit-sphere in $\mathbb{R}^d$. We here show that condition (1.2) is similarly approximately satisfied for most $\beta$'s if $d$ is large, in the sense that

$$\nu\{\beta \in \mathbb{R}^d : \|\mathbb{E}[ZZ' \| \beta'Z = x] - (I_d + (x^2 - 1)\beta\beta')\| > \varepsilon\} \xrightarrow[d \to \infty]{} 0 \tag{1.4}$$

for each $x \in \mathbb{R}$ and for each $\varepsilon > 0$ under the assumptions of Theorem 2.1 in Section 2.
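The identity for the left-hand side of (1.1) can be checked numerically. Writing $m = \mathbb{E}[Z \| \beta'Z = x]$, one has $\beta'm = x$, hence $\alpha'm - \alpha'\beta x = \alpha'(m - x\beta)$ and, by the Cauchy-Schwarz inequality, the supremum over unit-vectors $\alpha$ equals $\|m - x\beta\|^2 = \|m\|^2 - x^2$, attained at $\alpha = (m - x\beta)/\|m - x\beta\|$. The sketch below builds a hypothetical vector $m$ with $\beta'm = x$ (it is not a conditional mean computed from any particular distribution) and compares both sides; all function names are illustrative.

```python
import math
import random


def random_unit_vector(d, rng):
    """Uniform draw from the unit sphere: normalize a standard Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lhs_11(m, x):
    """Left-hand side of (1.1): ||m||^2 - x^2."""
    return dot(m, m) - x * x


def worst_case_deviation(m, beta, x):
    """sup over unit alpha of (alpha'm - alpha'beta * x)^2, attained at alpha proportional to m - x*beta."""
    r = [mi - x * bi for mi, bi in zip(m, beta)]
    norm_r = math.sqrt(dot(r, r))
    if norm_r == 0.0:
        return 0.0
    alpha = [ri / norm_r for ri in r]
    return (dot(alpha, m) - dot(alpha, beta) * x) ** 2


rng = random.Random(1)
d, x = 5, 0.7
beta = random_unit_vector(d, rng)
w = [rng.gauss(0.0, 1.0) for _ in range(d)]
proj = dot(beta, w)
m = [x * bi + wi - proj * bi for bi, wi in zip(beta, w)]  # ensures beta'm = x
print(lhs_11(m, x), worst_case_deviation(m, beta, x))  # equal up to rounding
```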

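So far, we have seen for fixed $x \in \mathbb{R}$ and for most $\beta$'s that (1.1) and (1.2) are approximately satisfied, in the sense that (1.3) and (1.4) hold under some conditions. Our main result is that (1.1) and (1.2) are approximately satisfied for most $\beta$'s and for most $x$'s: Under the assumptions of Theorem 2.1, there are Borel subsets $B_d$ of $\mathbb{R}^d$ satisfying $\nu(B_d) \to 1$ so that

$$\sup_{\beta \in B_d} \mathbb{P}\big(\|\mathbb{E}[Z \| \beta'Z]\|^2 - (\beta'Z)^2 > \varepsilon\big) \xrightarrow[d \to \infty]{} 0 \quad\text{and} \tag{1.5}$$

$$\sup_{\beta \in B_d} \mathbb{P}\big(\|\mathbb{E}[ZZ' \| \beta'Z] - (I_d + ((\beta'Z)^2 - 1)\beta\beta')\| > \varepsilon\big) \xrightarrow[d \to \infty]{} 0 \tag{1.6}$$

hold for each $\varepsilon > 0$.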
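To see the content of (1.5) and (1.6) numerically, one can compare a very low and a moderately high dimension. The following Monte Carlo sketch is an illustration under stated assumptions, not part of the paper: $Z$ has i.i.d. components uniform on $[-\sqrt{3}, \sqrt{3}]$ (so $\mathbb{E}Z = 0$ and $\mathbb{E}ZZ' = I_d$), $\alpha = e_1$, and the range $[-2, 2]$ of $x = \beta'Z$ is split into bins of width 0.5. Within each bin it estimates the mean and variance of the residual $r = \alpha'Z - (\alpha'\beta)(\beta'Z)$; if (1.1) and (1.2) held exactly, these would be $0$ and $1 - (\alpha'\beta)^2$. For $d = 2$ and $\beta = (1, 1)'/\sqrt{2}$ the conditional variance is $(2\sqrt{3} - \sqrt{2}|x|)^2/12$, far from constant, whereas for a $\beta$ drawn from $\nu$ with $d = 100$ both deviations are typically small. Function names and sample sizes are hypothetical choices.

```python
import math
import random

SQRT3 = math.sqrt(3.0)


def random_unit_vector(d, rng):
    """Draw from nu, the uniform distribution on the unit sphere, by normalizing a Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def binned_deviations(beta, n=20_000, seed=0, lo=-2.0, hi=2.0, n_bins=8, min_count=50):
    """Max over bins of |mean(r)| and |var(r) - (1 - gamma^2)|, with r = Z_1 - gamma * beta'Z.

    Z has i.i.d. Uniform(-sqrt3, sqrt3) components, alpha = e_1 and gamma = alpha'beta.
    """
    rng = random.Random(seed)
    gamma = beta[0]
    sigma2 = 1.0 - gamma ** 2
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for _ in range(n):
        z = [rng.uniform(-SQRT3, SQRT3) for _ in beta]
        x = sum(b * zi for b, zi in zip(beta, z))
        if lo <= x < hi:
            idx = min(int((x - lo) / width), n_bins - 1)  # guard against rounding at hi
            bins[idx].append(z[0] - gamma * x)
    mean_dev = var_dev = 0.0
    for rs in bins:
        if len(rs) < min_count:  # skip sparsely populated tail bins
            continue
        mean = sum(rs) / len(rs)
        var = sum((v - mean) ** 2 for v in rs) / (len(rs) - 1)
        mean_dev = max(mean_dev, abs(mean))
        var_dev = max(var_dev, abs(var - sigma2))
    return mean_dev, var_dev


rng = random.Random(123)
dev_low = binned_deviations([1.0 / math.sqrt(2.0)] * 2)    # d = 2, beta = (1, 1)'/sqrt(2)
dev_high = binned_deviations(random_unit_vector(100, rng))  # d = 100, beta drawn from nu
print("d=2:  ", dev_low)
print("d=100:", dev_high)
```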



³ This easily follows from the fact that $\mathbb{E}[Z \| \beta'Z]$ (resp., $\beta\beta'Z$) is the orthogonal projection of $Z$ into the space of all measurable (resp., linear) functions of $\beta'Z$ in $L^2(\mathbb{P})$, and from the relation between the unconditional and the conditional variance.
Following a referee's suggestion, we now compare our findings to the work of Diaconis and Freedman [5], which is an important precursor to the results of [11] and hence, a fortiori, also to the results in this paper; see also the discussion surrounding the displays (1.7)-(1.8) in [11]. (Moreover, the recent work of Dümbgen and Zerial [8] should be mentioned here, where several extensions and generalizations of the results of [5] are provided.) Under the assumptions of Theorem 2.1, Proposition 5.2 of [5] entails, for large $d$, that the (bivariate) joint distribution of $\alpha'Z$ and $\beta'Z$ is approximately normal, with zero means, unit variances, and covariance $\alpha'\beta$, for most pairs of unit-vectors $\alpha$ and $\beta$ (in the sense of weak convergence in probability with respect to the product measure $\nu \otimes \nu$ as $d \to \infty$). Because the normal distribution has linear conditional means and constant conditional variances, this suggests, but does not prove, that

$$\mathbb{P}\big(|\mathbb{E}[\alpha'Z \| \beta'Z] - \alpha'\beta\beta'Z|^2 > \varepsilon\big) \xrightarrow[d \to \infty]{} 0 \tag{1.7}$$

for each $\varepsilon > 0$ and for most pairs of unit-vectors $\alpha$ and $\beta$ in $\mathbb{R}^d$ (in the sense of convergence in probability as a function of $\alpha$ and $\beta$ with respect to $\nu \otimes \nu$). If $\alpha$ is treated as an unknown parameter, and if the observations $\alpha'Z$ and $\beta'Z$ are treated as response and explanatory variable, respectively, then (1.7) entails, for most $\alpha$'s and $\beta$'s, that the response can be approximated by a linear function of the explanatory variable plus an error term with zero mean conditional on $\beta'Z$, provided that $d$ is large. The approximating linear function is $\alpha'\beta(\beta'Z)$. But for large $d$, we also have $\alpha'\beta \approx 0$ for most $\alpha$'s and $\beta$'s (with respect to $\nu \otimes \nu$). The statement in (1.5), on the other hand, is equivalent to



$$\mathbb{P}\Big(\sup_{\alpha} |\mathbb{E}[\alpha'Z \| \beta'Z] - \alpha'\beta\beta'Z|^2 > \varepsilon\Big) \xrightarrow[d \to \infty]{} 0 \tag{1.8}$$

for each $\varepsilon > 0$ (in probability as a function of $\beta$ with respect to $\nu$); cf. the discussion following (1.2). The statement in (1.8) is obviously much stronger than that in (1.7). And it guarantees that the conditional mean of $\alpha'Z$ is approximately linear in $\beta'Z$, for all $\alpha$'s and for most $\beta$'s; this includes, in particular, the statistically interesting case where $\alpha$ is parallel, or close to parallel, to $\beta$. Finally, as already observed in [11], "Diaconis and Freedman's result does not provide clues as to whether [statements like (1.8)] might be true or false." Similar observations also apply to conditional variances, mutatis mutandis.
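The remark that $\alpha'\beta \approx 0$ for most pairs can be made quantitative: if $\alpha$ and $\beta$ are independent with distribution $\nu$, then $\mathbb{E}[(\alpha'\beta)^2] = \mathbb{E}[\alpha'\,\mathbb{E}[\beta\beta']\,\alpha] = 1/d$, because $\mathbb{E}[\beta\beta'] = I_d/d$ by symmetry; so the typical size of $\alpha'\beta$ is $1/\sqrt{d}$. The short Monte Carlo sketch below illustrates this; the function names, numbers of pairs and dimensions are hypothetical choices.

```python
import math
import random


def random_unit_vector(d, rng):
    """Draw from nu, the uniform distribution on the unit sphere, by normalizing a Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def mean_squared_inner_product(d, pairs=1_000, seed=0):
    """Average of (alpha'beta)^2 over independent pairs drawn from nu; its expectation is 1/d."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        a = random_unit_vector(d, rng)
        b = random_unit_vector(d, rng)
        total += sum(x * y for x, y in zip(a, b)) ** 2
    return total / pairs


results = {d: d * mean_squared_inner_product(d) for d in (10, 400)}
print(results)  # d times the average should be close to 1 in both cases
```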

1.3. Discussion. If the left-hand sides of (1.5) and (1.6) are both small, and if $\beta \in B_d$, then the simple linear model, where the response $\alpha'Z$ is explained by a linear function of the explanatory variable $\beta'Z$ plus an error that has zero mean and constant variance given $\beta'Z$, is approximately correct, irrespective of the unit-vector $\alpha$. Here, "approximately correct" means that the expressions on the left-hand sides of (1.1) and (1.2) are at most $\varepsilon$ for a range of values $x$ that contains the explanatory variable $\beta'Z$ with high probability. Under the conditions of Theorem 2.1, a sufficiently large dimension is enough to guarantee that $B_d$ is large and that the left-hand sides of (1.5) and (1.6) are small.⁴

The statistical impact of our results is most pronounced in situations where the sample size is small and the dimension is large. Assume that Theorem 2.1 applies, and consider a collection of $n$ independent copies of the pair $(\alpha'Z, \beta'Z)$ that we denote by $(\alpha'Z_i, \beta'Z_i)$, $i = 1, \ldots, n$, with $\beta \in B_d$. If $d$ is large and $n$ is comparatively small, so that the left-hand sides of both (1.5) and (1.6) are still small even when multiplied by $n$, then the simple linear model discussed in the preceding paragraph can also be used to approximately describe the relation between $\alpha'Z_i$ and $\beta'Z_i$ for each $i = 1, \ldots, n$, irrespective of $\alpha$.

We stress that additional data may give reason to dismiss the simple lin- 
ear model considered in the preceding paragraphs in favor of a more complex 
one, because the error suffered from using a model that is only approximately 
correct will typically become apparent if n increases to a value that is no 
longer sufficiently small relative to d. This is in line with R. A. Fisher's 1922 
observation that "more or less elaborate forms [of models] will be suitable 
according to the volume of data;" cf. [10]. And we stress that our results 
cannot guarantee that a given simple model, like that discussed in the pre- 
ceding paragraphs, is correct. But we can guarantee, under the assumptions 
of Theorem 2.1, that most simple models are approximately correct, in the 
sense that $\nu(B_d)$ is large and in the sense that the left-hand sides of (1.5) and (1.6) are small, provided only that $d$ is sufficiently large. This also underscores the need for critical examination of the data and of the model fit, irrespective of whether or not $d$ is large. To this end, a very useful diagnostic tool is introduced by Li in [18], namely a method to estimate, for a given unit-vector $\beta$, that unit-vector $\alpha$ for which the conditional mean of $\alpha'Z$ given $\beta'Z$ is most nonlinear in $\beta'Z$; see also Section 6.1 in that reference.

The results obtained in this paper do not suggest that one should abandon 
the search for complex and potentially nonlinear relations in the data. But 
after such complex and/or nonlinear relations have been accounted for, or 
in the case where none such can be found, our results show how the use 
of simple linear models can be justified without imposing strong regularity 
conditions on the true data-generating process. 



⁴ Note that this disentangles the issue of (approximate) model validity and the issue of model performance: The model is approximately valid if $\beta \in B_d$, irrespective of $\alpha$; the performance of this model, on the other hand, that is, the performance of $\beta'Z$ as a predictor for $\alpha'Z$, depends on both $\beta$ and $\alpha$. Under classical parametric or nonparametric assumptions, a simple model that is (approximately) correct typically also performs well.
The discussion so far suggests two extensions of our results that are beyond the scope of this paper. The first one is to extend our findings to the case of more than one explanatory variable, that is, the case where the conditioning is not on $\beta'Z$ but on $(\beta_1'Z, \beta_2'Z, \ldots, \beta_p'Z)$ for a collection of $p$ mutually orthogonal unit-vectors $\beta_1, \ldots, \beta_p$. In fact, Hall and Li [11] sketch an extension of their results to that situation, so that an appropriate generalization of (1.3) holds. We will consider a corresponding generalization of (1.4) and also of the main results in (1.5) and (1.6) elsewhere. The second extension is to provide explicit upper bounds for the expressions on the left-hand sides of (1.5) and (1.6) that converge to zero as $d \to \infty$ at a fast rate, and to provide an explicit lower bound for $\nu(B_d)$ that converges to one as $d \to \infty$, also at a fast rate.

1.4. Some consequences. 

1.4.1. SIR, SAVE and related methods. Many modern dimension reduc- 
tion methods, like those based on inverse conditional moments, rely on condi- 
tions like (i) and (ii) in Section 1.2. In particular, first-moment-based meth- 
ods like Sliced Inverse Regression [16] are based on a linear conditional mean 
requirement as in (i). (Besides, this requirement is also used in several impor- 
tant results on generalized least squares under possible link misspecification; 
see, for example, [19] and the references cited therein.) And second-moment- 
based techniques like the Sliced Average Variance Estimator [4], Principal 
Hessian Directions [17], or Directional Regression [15], are based on both a 
linear conditional mean requirement as in (i), and on a constant conditional 
variance requirement as in (ii). Both conditions (i) and (ii) are also used in 
recent works such as [2, 3, 6, 14]. 

Given observations from a potentially rather complex data-generating 
process, the dimension-reduction methods mentioned in the preceding para- 
graph aim at finding a simpler model that also describes the data. To justify 
the dimension reduction, these methods make assumptions to the effect that 
requirements like (i) or (ii) are satisfied, for one particular projection, namely
for the projection on the so-called central subspace. Under such assumptions, 
the central subspace or, equivalently, the projection onto it, can be recov- 
ered from the data with good accuracy. But as outlined in the Introduction, 
verifying such assumptions in practice can be hard, particularly in situations 
where the sample size is comparatively small. 

Our results provide an alternative justification for requirements like (i) and (ii). In particular, in the setting of Theorem 2.1, we see that both (i) and (ii) are approximately satisfied for most projections $\beta'Z$ in the sense of (1.5) and (1.6), provided only that the underlying dimension is large. For the linear conditional mean condition, we stress that the relation (1.3) has been derived much earlier in [11].
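Since sliced inverse regression is the prototypical method that relies on the linear conditional mean requirement (i), a minimal sketch may help fix ideas. It follows the usual SIR recipe of [16] in simplified form and rests on stated assumptions: the predictors are already standardized (mean zero, identity covariance), so the whitening step is omitted; slices are equal-count groups of the sorted responses; and the leading eigenvector of the weighted covariance of slice means is found by power iteration. The function name, the link $y = \beta'X + 0.5\,\varepsilon$ and all sizes are illustrative choices. With Gaussian predictors, requirement (i) holds exactly for every direction.

```python
import math
import random


def sir_direction(X, y, n_slices=10, iters=200):
    """Leading SIR direction for standardized predictors X (a list of d-vectors)."""
    n, d = len(X), len(X[0])
    order = sorted(range(n), key=lambda i: y[i])  # sort observations by response
    size = math.ceil(n / n_slices)
    M = [[0.0] * d for _ in range(d)]  # weighted covariance of slice means
    for start in range(0, n, size):
        idx = order[start:start + size]
        p = len(idx) / n
        m = [sum(X[i][j] for i in idx) / len(idx) for j in range(d)]
        for j in range(d):
            for k in range(d):
                M[j][k] += p * m[j] * m[k]
    v = [1.0 / math.sqrt(d)] * d  # start vector; assumed not orthogonal to the target
    for _ in range(iters):  # power iteration for the top eigenvector of M
        w = [sum(M[j][k] * v[k] for k in range(d)) for j in range(d)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    return v


rng = random.Random(0)
d, n = 5, 5_000
beta = [0.6, 0.8, 0.0, 0.0, 0.0]
X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
y = [sum(b * xi for b, xi in zip(beta, x)) + 0.5 * rng.gauss(0.0, 1.0) for x in X]
v = sir_direction(X, y)
print(abs(sum(a * b for a, b in zip(v, beta))))  # |cosine| with the true direction
```

The sign of the estimated direction is not identified, which is why the check uses the absolute cosine.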
1.4.2. Sparse linear modeling. Consider the linear model with univariate response $y$ and a $d$-vector of explanatory variables $w$, that is,

$$y = \theta'w + \varepsilon, \tag{1.9}$$

where $\theta \in \mathbb{R}^d$ is unknown, and where the error $\varepsilon$ has zero mean and constant variance conditional on $w$. We also assume that $y$ and $w$ are square integrable and centered so that $\mathbb{E}w = 0$. The leading case we have in mind is a situation where $d$, that is, the number of available regressors, is as large as, or even much larger than, the sample size. To deal with such situations, it is common to assume that the true model (1.9) is equal to, or can be closely approximated by, a "sparse" submodel that uses only a few explanatory variables, and to use the available data to select and fit a sparse submodel. Such sparsity assumptions are clearly restrictive. In the following, we argue that the results in this paper provide weaker, and hence less restrictive, assumptions that also justify the fitting of sparse submodels.

For illustration, consider now an extremely sparse model where $y$ is regressed on just one explanatory variable, say, $w_1$, that is,

$$y = c\,w_1 + e, \tag{1.10}$$

where $c \in \mathbb{R}$ is unknown, and where $e$ has zero mean and constant variance given $w_1$. To consider various possible justifications of the sparse submodel (1.10), we first rewrite the overall model (1.9) as

$$y = \big(\theta_1 w_1 + \mathbb{E}[\theta_{-1}'w_{-1} \| w_1]\big) + \big(\theta_{-1}'w_{-1} - \mathbb{E}[\theta_{-1}'w_{-1} \| w_1] + \varepsilon\big),$$

where $\theta_{-1}$ and $w_{-1}$ are obtained from $\theta$ and $w$, respectively, by deleting the first component.

One possibility to justify the sparse model (1.10) is to impose the extreme sparsity assumption that all coefficients of $\theta_{-1}$ are zero, so that $\theta_{-1}'w_{-1} = 0$. Then the relation in the preceding display obviously reduces to $y = \theta_1 w_1 + \varepsilon$, and (1.10) applies with $c = \theta_1$ and $e = \varepsilon$. Under this extreme sparsity assumption we obtain, in particular, that the sparse model (1.10) is equivalent to the overall model (1.9) in terms of prediction, because $\mathbb{E}[y \| w_1] = \mathbb{E}[y \| w]$. A slightly relaxed sparsity condition is to assume that the coefficients of $\theta_{-1}$ are possibly nonzero but otherwise negligible, that is, $\theta_{-1}'w_{-1} \approx 0$, so that the sparse model (1.10) is approximately valid with $c \approx \theta_1$ and $e \approx \varepsilon$.

An alternative justification of (1.10) is to impose the assumption that, given $w_1$, the conditional mean of $\theta_{-1}'w_{-1}$ is linear in $w_1$ and the conditional variance of $\theta_{-1}'w_{-1}$ is constant in $w_1$. In that case, the relation in the preceding display also reduces to (1.10), but now with $c = \mathrm{Cov}[\theta'w, w_1]/\mathrm{Var}[w_1] = \theta_1 + \sum_{j=2}^{d} \theta_j \mathrm{Cov}[w_1, w_j]/\mathrm{Var}[w_1]$, and $e = \theta_{-1}'w_{-1} - \mathbb{E}[\theta_{-1}'w_{-1} \| w_1] + \varepsilon$. Under this alternative assumption, the model (1.10) is valid but typically less accurate in terms of prediction than the overall model (1.9), because, typically, $\mathbb{E}[y \| w_1] \neq \mathbb{E}[y \| w]$ and hence $\mathrm{Var}[y \| w_1] > \mathrm{Var}[y \| w]$. As before, these assumptions can be relaxed by requiring that the conditional mean is approximately linear and the conditional variance is approximately constant.
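The coefficient $c$ in this type of justification is a population regression slope and can be computed directly from the covariance matrix of $w$. The sketch below is illustrative only: it assumes a hypothetical three-dimensional Gaussian $w$ with unit variances, $\mathrm{Cov}[w_1, w_2] = \rho$ and $w_3$ independent of $(w_1, w_2)$, for which the linear conditional mean and constant conditional variance assumptions hold exactly, together with $\varepsilon \sim N(0, 1)$. It compares the formula for $c$ with a simulated least-squares slope of $y$ on $w_1$; the function names and the values of $\theta$ and $\rho$ are assumptions made for the example.

```python
import math
import random


def submodel_coefficient(theta, cov):
    """c = Cov[theta'w, w_1] / Var[w_1], computed from the covariance matrix of w."""
    return sum(t * cov[0][j] for j, t in enumerate(theta)) / cov[0][0]


def simulated_slope(theta, rho, n=100_000, seed=0):
    """Least-squares slope (no intercept) of y = theta'w + eps on w_1, for the Gaussian w above."""
    rng = random.Random(seed)
    sxy = sxx = 0.0
    for _ in range(n):
        z1, z2, z3 = (rng.gauss(0.0, 1.0) for _ in range(3))
        w = (z1, rho * z1 + math.sqrt(1.0 - rho ** 2) * z2, z3)
        y = sum(t * wi for t, wi in zip(theta, w)) + rng.gauss(0.0, 1.0)
        sxy += w[0] * y
        sxx += w[0] * w[0]
    return sxy / sxx


theta, rho = (1.0, 2.0, -1.0), 0.5
cov = [[1.0, rho, 0.0], [rho, 1.0, 0.0], [0.0, 0.0, 1.0]]
c = submodel_coefficient(theta, cov)  # 1 + 2 * 0.5 + (-1) * 0 = 2.0
slope = simulated_slope(theta, rho)
print(c, slope)
```

In this example the error $e$ of the submodel has conditional variance $\theta_2^2(1 - \rho^2) + \theta_3^2 + 1 = 5$, compared with the error variance 1 in (1.9), in line with the remark that (1.10) is typically less accurate for prediction.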

In the preceding two paragraphs, we have considered two types of jus- 
tifications for fitting the submodel (1.10). Type (a): Exact or approximate 
sparsity assumptions. And type (b): Exact or approximate linear conditional 
mean and constant conditional variance assumptions. In practice, verifying 
either of these assumptions for a given submodel can be difficult. This raises 
the question as to which set of assumptions, that is, (a) or (b), is more re- 
strictive. To this end, we first note that (a) obviously implies (b). For the 
more detailed comparison that we give in the following, we assume that the 
law of $w$ is nondegenerate so that $w$ can be written as $w = MZ$ for a $d$-vector $Z$ satisfying $\mathbb{E}Z = 0$ and $\mathbb{E}ZZ' = I_d$. The $d \times d$ matrix $M$ is a square root of the variance/covariance matrix of $w$, which is nondegenerate by assumption; which can be assumed to be symmetric; and which need not be known in practice. Then $\theta_{-1}'w_{-1}$ and $w_1$ can be written as $\theta_{-1}'w_{-1} = a_0'Z$ and $w_1 = b_0'Z$ with $a_0' = (0, \theta_2, \ldots, \theta_d)M$ and $b_0' = (1, 0, \ldots, 0)M$.
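For $d = 2$ this construction can be carried out explicitly, because a $2 \times 2$ symmetric positive definite matrix $\Sigma$ has the closed-form symmetric square root $M = (\Sigma + sI_2)/t$ with $s = \sqrt{\det\Sigma}$ and $t = \sqrt{\operatorname{trace}\Sigma + 2s}$. The sketch below, with a hypothetical $\Sigma$ and $\theta$, forms $a_0$ and $b_0$, checks that $a_0'b_0 = \mathrm{Cov}[\theta_{-1}'w_{-1}, w_1] = \theta_2\Sigma_{12}$, $a_0'a_0 = \theta_2^2\Sigma_{22}$ and $b_0'b_0 = \Sigma_{11}$, and computes the unit-length versions $\alpha_0$ and $\beta_0$ of $a_0$ and $b_0$. The restriction to $d = 2$ is made only to avoid a general matrix square root; the helper names are illustrative.

```python
import math


def sqrtm_2x2(a):
    """Symmetric positive definite square root of a 2x2 SPD matrix: (A + s I) / t."""
    s = math.sqrt(a[0][0] * a[1][1] - a[0][1] * a[1][0])
    t = math.sqrt(a[0][0] + a[1][1] + 2.0 * s)
    return [[(a[0][0] + s) / t, a[0][1] / t], [a[1][0] / t, (a[1][1] + s) / t]]


def row_times(v, m):
    """Row vector v' times matrix m."""
    return [sum(v[j] * m[j][k] for j in range(len(v))) for k in range(len(m[0]))]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


sigma = [[2.0, 0.6], [0.6, 1.0]]  # hypothetical covariance matrix of w = (w_1, w_2)
theta = [1.5, -2.0]               # hypothetical coefficient vector
M = sqrtm_2x2(sigma)
a0 = row_times([0.0, theta[1]], M)  # a_0' = (0, theta_2) M
b0 = row_times([1.0, 0.0], M)       # b_0' = (1, 0) M
alpha0 = [v / math.sqrt(dot(a0, a0)) for v in a0]
beta0 = [v / math.sqrt(dot(b0, b0)) for v in b0]
print(dot(a0, b0), theta[1] * sigma[0][1])  # both equal Cov[theta_2 w_2, w_1]
```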

The type (a) condition that $\theta_{-1} = 0$ entails that $a_0$ is equal to zero; the collection of $\theta$'s that satisfy this condition is the 1-dimensional subset of the parameter space $\mathbb{R}^d$ that is spanned by $M^{-1}b_0 = (1, 0, \ldots, 0)'$ (this collection depends on $b_0$). More generally, for $y$ as in (1.9) and for each vector $b$, the simple model with response $y$ and with explanatory variable $b'Z$, that is, $y = c(b'Z) + e$, can be justified by the type (a) condition that $\theta$ is parallel to $M^{-1}b$. And, for each vector $b$, the approximate type (a) condition, that $\theta$ is approximately parallel to $M^{-1}b$, is satisfied if $\theta$ belongs to an appropriately small neighborhood of the span of $M^{-1}b$.

To study type (b) conditions on the conditional moments of $\theta_{-1}'w_{-1}$ given $w_1$ or, equivalently, on the conditional moments of $a_0'Z$ given $b_0'Z$, we may replace the vectors $a_0$ and $b_0$ by standardized versions $\alpha_0$ and $\beta_0$ that have length one, for example, $\alpha_0 = a_0/\|a_0\|$. (Indeed, the conditional mean of $a_0'Z$ given $b_0'Z$ is linear, or approximately linear, in $b_0'Z$, if and only if the same is true for the conditional mean of $\alpha_0'Z$ given $\beta_0'Z$; and a similar statement applies for the conditional variances, mutatis mutandis.) If Theorem 2.1 applies and if $d$ is large, then for most $\beta$'s and uniformly in $\alpha$, the conditional mean of $\alpha'Z$ given $\beta'Z$ is approximately linear and the conditional variance of $\alpha'Z$ given $\beta'Z$ is approximately constant, in the sense of (1.5) and (1.6). In terms of the original parameter $\theta$, we note that uniformity in $\alpha$ corresponds to uniformity in $\theta \in \mathbb{R}^d \setminus \{0\}$.

2. Main result and outline of proof. Our main result is that (1.3)-(1.4) and also (1.5)-(1.6) hold, for sets $B_d$ with $\lim_{d \to \infty} \nu(B_d) = 1$. We will establish this under the basic condition that $Z$ has a Lebesgue density and is standardized so that $\mathbb{E}Z = 0$ and $\mathbb{E}ZZ' = I_d$ for each $d$. For the method of proof that we employ, we also rely on two technical conditions, which follow.



Condition (t1). For fixed $k \in \mathbb{N}$ and for each $d$, set $S_k = (Z_i'Z_j/d)_{i,j=1}^{k}$, where the $Z_i$'s are i.i.d. copies of $Z$.

(a) We have $\mathbb{E}[(\sqrt{d}\,\|S_k - I_k\|)^{2k+1}] = O(1)$ as $d \to \infty$. Moreover, let $H$ be a monomial in the elements of $S_k - I_k$ of degree $h \le 2k$. If $H$ has a linear factor, then $d^{h/2}\mathbb{E}H = o(1)$. And if $H$ consists only of quadratic factors in the elements of $S_k - I_k$ above the diagonal, then $d^{h/2}\mathbb{E}H = 1 + o(1)$.

(b) Consider two monomials $G$ and $H$ of degree $g$ and $h$, respectively, in the elements of $S_k - I_k$. If $G$ is given by $Z_1'Z_2\, Z_2'Z_3 \cdots Z_{g-1}'Z_g\, Z_g'Z_1/d^g$, if $H$ depends at least on those $Z_i$'s with $i \le g$, and if $2 \le h < g \le k$, then $d^g\,\mathbb{E}GH = o(1)$.

Condition (t2). For fixed $k \in \mathbb{N}$, for each $d$, and for any orthogonal $d \times d$ matrix $R$, the marginal density of the last $d - k + 1$ components of $RZ$

-I jry 

is bounded by (^_]^) j^d-k+i jg^ some constant B that does not depend on 
d or R. 

Conditions (t1) and (t2) are always satisfied, for any fixed $k$, if the components of $Z$ are independent, with bounded marginal densities and bounded marginal moments of sufficiently high order; cf. Example A.1 in the supplementary material [13]. Also, if $\mathbb{E}[(\sqrt{d}\,\|S_k - I_k\|)^{2k+1}] = O(1)$, then condition (t1)(a) is satisfied if the elements of $\sqrt{d}(S_k - I_k)$ jointly converge to a Gaussian; cf. Example A.2 in the supplementary material [13]. However, these conditions are more general than that and allow, in particular, for situations where the components of $Z$ are dependent and/or where the elements of $\sqrt{d}(S_k - I_k)$ do not converge in distribution. Also note that both conditions are orthogonally invariant: If $Z$ satisfies any one of them, then the same is true for any orthogonal transformation of $Z$.
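As an illustration of what condition (t1)(a) asks for, consider the independent-components case just mentioned, with components uniform on $[-\sqrt{3}, \sqrt{3}]$ (an assumption made here for concreteness). For a single quadratic factor above the diagonal, $H = ((S_k)_{12})^2$ has degree $h = 2$, and $d^{h/2}\mathbb{E}H = \mathbb{E}[(Z_1'Z_2)^2]/d = \mathbb{E}[Z_1'Z_1]/d = 1$ exactly, since $\mathbb{E}ZZ' = I_d$. The Monte Carlo sketch below, whose function names are hypothetical, estimates this quantity and shows that the entries of $\sqrt{d}(S_k - I_k)$ are of order one.

```python
import math
import random

SQRT3 = math.sqrt(3.0)


def scaled_gram_deviation(d, k, rng):
    """sqrt(d) * (S_k - I_k), where S_k = (Z_i'Z_j / d) for k i.i.d. copies Z_i of Z.

    Z is assumed to have i.i.d. Uniform(-sqrt3, sqrt3) components (mean 0, variance 1).
    """
    zs = [[rng.uniform(-SQRT3, SQRT3) for _ in range(d)] for _ in range(k)]
    root_d = math.sqrt(d)
    return [
        [root_d * (sum(a * b for a, b in zip(zs[i], zs[j])) / d - (1.0 if i == j else 0.0))
         for j in range(k)]
        for i in range(k)
    ]


def quadratic_factor_moment(d, reps=500, seed=0):
    """Monte Carlo estimate of d * E[((S_k)_{12})^2]; condition (t1)(a) requires 1 + o(1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        total += scaled_gram_deviation(d, 2, rng)[0][1] ** 2
    return total / reps


print(scaled_gram_deviation(200, 2, random.Random(1)))  # entries of order one
moment = quadratic_factor_moment(200)
print(moment)  # close to 1
```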

The first requirement of condition (t1)(a) entails that $d^{h/2}\mathbb{E}H = O(1)$ for any monomial $H$ in the elements of $S_k - I_k$ of degree $h$ up to $2k + 1$. Condition (t1)(b) strengthens parts of condition (t1)(a) in the following sense: Consider monomials $G$ and $H$ as in condition (t1)(b). If condition (t1)(a) is satisfied, then $\mathbb{E}GH = o(d^{-(g+h)/2})$ (because $GH$ is a monomial of degree $g + h$ that has a linear factor). Condition (t1)(b) requires that $\mathbb{E}GH$ converges to zero at the faster rate $o(d^{-g})$. Condition (t2) ensures that the distribution of $Z$ is not "too concentrated" in certain directions, and is used together with (t1)(a) to guarantee uniform integrability of $d/Z'Z$ and related quantities (see Proposition E.1 in the supplementary material [13]). Also, our conditions should be compared to those used in [11].⁵

We can now state the main result of this paper. 



⁵ The results in [11] are stated under high-level assumptions that are less specific but harder to verify; see, for example, conditions (3.21), (3.28) and (3.29) of Theorem 3.2 in
Theorem 2.1. For each $d$, consider a random $d$-vector $Z$ that has a Lebesgue density and that is standardized such that $\mathbb{E}Z = 0$ and $\mathbb{E}ZZ' = I_d$. If conditions (t1)(a) and (t2) are satisfied with $k = 2$, then there are Borel sets $B_d \subseteq \mathbb{R}^d$ satisfying $\lim_{d \to \infty} \nu(B_d) = 1$, such that (1.5) holds for each $\varepsilon > 0$. If condition (t1) and condition (t2) are satisfied with $k = 4$, then the sets $B_d$ can be chosen so that also (1.6) holds for each $\varepsilon > 0$. [Moreover, for each $x \in \mathbb{R}$ and each $\varepsilon > 0$, the relation (1.3) obtains under conditions (t1)(a) and (t2) with $k = 2$, and the relation (1.4) holds under conditions (t1) and (t2) with $k = 4$.]

In the remainder of this section, we give an outline of the proof of Theorem 2.1. The proof comprises five main steps corresponding to the five propositions that follow.

As the first step, it will be convenient to replace the usual reference measure on $\mathbb{R}^d$, that is, Lebesgue measure, by the $d$-variate standard Gaussian measure, that is, $N(0, I_d)$. The effect of this change of measure on conditional densities and on conditional expectations involving $Z$ is described by the next result.

Proposition 2.2. Fix $d \ge 1$, and consider a random $d$-vector $Z$ with Lebesgue density $f$. Let $V \sim N(0, I_d)$, and write $\phi(\cdot)$ for the Lebesgue density of $V$. Moreover, for a fixed unit-vector $\beta \in \mathbb{R}^d$ and for each $x \in \mathbb{R}$, set $W^{x,\beta} = x\beta + (I_d - \beta\beta')V$. Then the function $h(\cdot|\beta)$ defined by

$$h(x|\beta) = \mathbb{E}\left[\frac{f(W^{x,\beta})}{\phi(W^{x,\beta})}\right]$$

for $x \in \mathbb{R}$ is a density of $\beta'Z$ with respect to the univariate standard Gaussian measure [i.e., $h(x|\beta)\phi_1(x)$ is a Lebesgue density of $\beta'Z$ if $\phi_1$ denotes the $N(0,1)$-density]. Moreover, if $\Psi: \mathbb{R}^d \to \mathbb{R}$ is such that $\Psi(Z)$ is integrable, then a conditional expectation $\mathbb{E}[\Psi(Z) \| \beta'Z = x]$ of $\Psi(Z)$ given $\beta'Z$ satisfies

$$\mathbb{E}[\Psi(Z) \| \beta'Z = x]\, h(x|\beta) = \mathbb{E}\left[\Psi(W^{x,\beta})\, \frac{f(W^{x,\beta})}{\phi(W^{x,\beta})}\right]$$

whenever $x \in \mathbb{R}$ is such that $h(x|\beta) < \infty$.

This result allows us, for fixed $x \in \mathbb{R}$, to study the marginal density of $\beta'Z$ at $x$ as well as conditional expectations involving $Z$ given $\beta'Z = x$, by considering unconditional means involving the random variable $W^{x,\beta}$, which has a $N(x\beta, I_d - \beta\beta')$-distribution.
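The first statement of Proposition 2.2 can be checked numerically in a case where the density of $\beta'Z$ is known in closed form. The sketch below assumes, for illustration, $d = 2$, i.i.d. components of $Z$ uniform on $[-\sqrt{3}, \sqrt{3}]$ (so $f = 1/12$ on the square $[-\sqrt{3}, \sqrt{3}]^2$) and $\beta = (1, 1)'/\sqrt{2}$; then $\beta'Z$ has the triangular Lebesgue density $\max(0, \sqrt{6} - |x|)/6$. The code estimates $h(x|\beta)$ by a plain Monte Carlo average over $V$ and multiplies by $\phi_1(x)$. Function names are hypothetical, and such a plain average is only sensible in low dimensions, because $f/\phi$ can become very heavy-tailed as $d$ grows.

```python
import math
import random

SQRT3 = math.sqrt(3.0)


def uniform_square_density(w):
    """Lebesgue density of Z with i.i.d. Uniform(-sqrt3, sqrt3) components."""
    if all(abs(wi) <= SQRT3 for wi in w):
        return (1.0 / (2.0 * SQRT3)) ** len(w)
    return 0.0


def density_via_change_of_measure(x, beta, f, n=50_000, seed=0):
    """Monte Carlo estimate of h(x|beta) * phi_1(x), a Lebesgue density of beta'Z at x.

    h(x|beta) = E[f(W)/phi(W)] with W = x*beta + (I_d - beta beta')V and V ~ N(0, I_d).
    """
    rng = random.Random(seed)
    d = len(beta)
    total = 0.0
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        proj = sum(b * vi for b, vi in zip(beta, v))
        w = [x * b + vi - proj * b for b, vi in zip(beta, v)]
        phi_w = math.exp(-0.5 * sum(wi * wi for wi in w)) / (2.0 * math.pi) ** (d / 2)
        total += f(w) / phi_w
    phi_1 = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return total / n * phi_1


beta = [1.0 / math.sqrt(2.0)] * 2
est0 = density_via_change_of_measure(0.0, beta, uniform_square_density)
est1 = density_via_change_of_measure(1.0, beta, uniform_square_density)
for x, est in ((0.0, est0), (1.0, est1)):
    exact = max(0.0, (math.sqrt(6.0) - abs(x)) / 6.0)  # triangular density of beta'Z
    print(x, est, exact)
```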



that paper. The only specific example that is actually shown in [11] to satisfy all three of 
these high-level conditions is the normal distribution; cf. Example 4.2 and Remark 4.2 in 
that paper. 
Now, in order to derive (1.3), we follow [11] and use the following argument (which can be traced back to Hoeffding [12]; see also [8]): Set $\mu_{x|\beta} = \mathbb{E}[Z \| \beta'Z = x]$, and let $b$ be a random vector in $\mathbb{R}^d$ with distribution $\nu$, that is, such that $b$ is uniformly distributed on the unit-sphere, that is, on the set of unit-vectors in $\mathbb{R}^d$. Then (1.3) is equivalent to the statement that $\|\mu_{x|b}\|^2 - x^2$ converges to zero in probability as a function of $b$, and this will follow if

$$\mathbb{E}\big[(\|\mu_{x|b}\|^2 - x^2)\, h^2(x|b)\big] \quad\text{and}\quad \mathbb{E}\big[(h(x|b) - 1)^2\big]$$

both converge to zero as $d \to \infty$, where the expectations are taken with respect to $b$. [Note that $\mu_{x|b}$ and $h(x|b)$ are measurable in view of Corollary B.2. Also note that both integrands in the preceding display are nonnegative.]
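We now compute $\mathbb{E}[(h(x|b) - 1)^2]$ as

$$\mathbb{E}\big[(h(x|b) - 1)^2\big] = \int \left( \mathbb{E}\left[\frac{f(W^{x,\beta})}{\phi(W^{x,\beta})}\right]^2 - 2\,\mathbb{E}\left[\frac{f(W^{x,\beta})}{\phi(W^{x,\beta})}\right] + 1 \right) \nu(d\beta) = \mathbb{E}\left[\frac{f(W_1)\, f(W_2)}{\phi(W_1)\, \phi(W_2)}\right] - 2\,\mathbb{E}\left[\frac{f(W_1)}{\phi(W_1)}\right] + 1, \tag{2.1}$$

where the $W_i$'s are defined as $W_i = xb + (I_d - bb')V_i$, $i = 1, 2$, with $V_1$ and $V_2$ i.i.d. $N(0, I_d)$ and independent of $b$. Note that $W_1$ and $W_2$ are dependent because both share the same random unit-vector $b$. If (2.1) converges to zero or, equivalently, if $h(x|b) \to 1$ in $L^2(b)$ as $d \to \infty$, then $h(x|b) < \infty$ with probability one for sufficiently large $d$. For such $d$, we see that the second statement of Proposition 2.2 applies for $\nu$-almost all $\beta$, such that $\mathbb{E}[(\|\mu_{x|b}\|^2 - x^2)\, h^2(x|b)]$ can be written as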



$$
E\left[\left(W_1'W_2 - x^2\right)\frac{f(W_1)}{\phi(W_1)}\,\frac{f(W_2)}{\phi(W_2)}\right],
\tag{2.2}
$$

as is easily seen by arguing as in the derivation of (2.1). With this, we see that (1.3) holds if both (2.1) and (2.2) go to zero as $d \to \infty$. And if (2.1) and (2.2) converge to zero uniformly in $x$ over compact subsets of $\mathbb{R}$, that is, if the suprema of (2.1) and (2.2) over $x$ satisfying $|x| \le M$ converge to zero as $d \to \infty$ for each $M > 0$, then it is not difficult to also derive (1.5) by employing standard arguments; for details, see Lemma B.4(i) in the supplementary material [13].
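The step behind (2.1) and (2.2) can be made explicit. Applying Proposition 2.2 with $\Psi \equiv 1$ and, coordinatewise, with $\Psi(z) = z$, and using that $W_1$ and $W_2$ are conditionally i.i.d. given $b$, one gets

$$
h(x|b) = E\left[\frac{f(W_1)}{\phi(W_1)}\,\Big\|\,b\right],
\qquad
\mu_{x|b}\,h(x|b) = E\left[W_1\,\frac{f(W_1)}{\phi(W_1)}\,\Big\|\,b\right],
$$

and therefore

$$
\left(\|\mu_{x|b}\|^2 - x^2\right)h^2(x|b) = E\left[\left(W_1'W_2 - x^2\right)\frac{f(W_1)}{\phi(W_1)}\,\frac{f(W_2)}{\phi(W_2)}\,\Big\|\,b\right].
$$

Averaging over $b$ gives (2.2); the same computation with $W_1'W_2 - x^2$ replaced by $1$ yields the first term of (2.1).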

To establish (1.4), we employ a similar strategy and write $\Delta_{x|\beta}$ as shorthand for the $d \times d$ matrix $\Delta_{x|\beta} = E[ZZ'\,\|\,\beta'Z = x] - (I_d + (x^2 - 1)\beta\beta')$. With $b$ again uniform on the unit-sphere, the goal is to show convergence of the largest singular value of $\Delta_{x|b}$ to zero in probability with respect to $b$. Because $\Delta_{x|b}$ is symmetric, the $k$-th power of its largest singular value is bounded by $\operatorname{trace}\Delta_{x|b}^k$ whenever $k$ is even. Hence this follows if

$$
\operatorname{trace}\Delta_{x|b}^k \to 0 \quad\text{in probability}
$$

as $d \to \infty$ for some even integer $k$ (where, again, Corollary B.2 guarantees measurability). This, and hence also (1.4), will follow if

$$
E\left[\operatorname{trace}\Delta_{x|b}^k\, h^k(x|b)\right]
$$

converges to zero as $d \to \infty$ for some even integer $k$ and if, in addition, also (2.1) converges to zero. (We shall find, at the end of the section, that the expression in the preceding display converges to zero for $k = 4$ but typically not for $k = 2$.) To analyze the expectation in the preceding display, define the function $\Delta_{x|\beta}(z)$ as $\Delta_{x|\beta}(z) = zz' - (I_d + (x^2 - 1)\beta\beta')$ for $z \in \mathbb{R}^d$. We now argue as in the preceding paragraph: Assume that (2.1) converges to zero, and assume that $d$ is sufficiently large so that $h(x|\beta) < \infty$ for $\nu$-almost all $\beta$. For such $d$, use Proposition 2.2 to see that $E[\operatorname{trace}\Delta_{x|b}^k\, h^k(x|b)]$ equals



$$
E\left[\operatorname{trace}\left(\Delta_{x|b}(W_1)\cdots\Delta_{x|b}(W_k)\right)\frac{f(W_1)}{\phi(W_1)}\cdots\frac{f(W_k)}{\phi(W_k)}\right],
$$

where, similarly to before, $W_i = xb + (I_d - bb')V_i$ with the $V_i$, $1 \le i \le k$, i.i.d. $N(0, I_d)$ independent of $b$. Instead of computing the trace in the preceding display directly, we find it convenient to break it into smaller, more manageable pieces. Indeed, we find that the expression in the preceding display can be written as the weighted sum of the terms



$$
\begin{gathered}
\sum_{j=1}^{k}\binom{k}{j}(-1)^j\,E\left[\left(W_1'W_2\cdots W_j'W_1 - d + 1 - x^{2j}\right)\frac{f(W_1)}{\phi(W_1)}\cdots\frac{f(W_k)}{\phi(W_k)}\right]
\quad\text{and}\\
E\left[\left(\prod_{i=1}^{m} W_{j_{i-1}+1}'W_{j_{i-1}+2}\cdots W_{j_i-1}'W_{j_i} - x^{2(j_m - m)}\right)\frac{f(W_1)}{\phi(W_1)}\cdots\frac{f(W_k)}{\phi(W_k)}\right]
\end{gathered}
\tag{2.3}
$$

for $m \ge 1$ and indices $j_0, \ldots, j_m$ satisfying $j_0 = 0$, $j_m \le k$, and $j_{i-1} + 1 < j_i$ whenever $1 \le i \le m$. In addition, we find that the weights in this weighted sum depend only on $k$ and on $x$, and that the weights are continuous in $x \in \mathbb{R}$; cf. Lemma B.3 for details. [Note that, in (2.3), we write $W_1'W_2\cdots W_j'W_1$ as shorthand for $\operatorname{trace}\prod_{i=1}^{j} W_iW_i'$, and we also use notation like $W_1'W_2\cdots W_{j-1}'W_j$ as shorthand for $\prod_{i=1}^{j-1} W_i'W_{i+1}$.] Hence, (1.4) holds if the expression in (2.1) and those in (2.3) go to zero as $d \to \infty$, the latter for some even integer $k$, and for any $m$ and $j_0, \ldots, j_m$ as indicated. Moreover, (1.6) holds provided that the expressions in (2.1) and (2.3) all converge to zero uniformly in $x$ over compact subsets of $\mathbb{R}$; details are given in Lemma B.4(ii) in the supplementary material [13].
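The passage from the largest singular value to $\operatorname{trace}\Delta_{x|b}^k$ rests on a standard eigenvalue fact: for a symmetric matrix with eigenvalues $\lambda_i$, $\operatorname{trace}\Delta^k = \sum_i \lambda_i^k$, and for even $k$ every summand is nonnegative, so $\max_i|\lambda_i|^k \le \operatorname{trace}\Delta^k$. The NumPy sketch below checks this on a random symmetric matrix (a stand-in, not one of the paper's $\Delta_{x|b}$) and shows why $k$ has to be even; the function name is hypothetical.

```python
import numpy as np

def power_vs_trace(delta, k):
    """Return (largest singular value of delta)**k and trace(delta**k).

    For symmetric delta the largest singular value is max_i |lam_i|, and
    trace(delta**k) = sum_i lam_i**k; for even k all summands are >= 0.
    """
    top = np.linalg.norm(delta, ord=2)  # spectral norm = largest singular value
    tr = np.trace(np.linalg.matrix_power(delta, k))
    return top ** k, tr

rng = np.random.default_rng(1)
a = rng.standard_normal((6, 6))
delta = (a + a.T) / 2  # random symmetric stand-in for Delta_{x|b}
for k in (2, 4):
    top_k, tr_k = power_vs_trace(delta, k)
    print(k, top_k, tr_k, top_k <= tr_k)

# Odd k can fail: eigenvalues 1, -1, -1 give trace(delta**3) = -1 while ||delta||**3 = 1.
print(power_vs_trace(np.diag([1.0, -1.0, -1.0]), 3))
```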

To understand the large-$d$ behavior of the quantities in (2.1), (2.2), and (2.3), we need to understand the joint distribution of the $W_i$'s, which is described by the next result.

Proposition 2.3. For $d$ and $k$ satisfying $1 \le k < d$, the joint distribution of $W_1, \ldots, W_k$ has a density with respect to Lebesgue measure that we denote by $\varphi_x(w_1, \ldots, w_k)$, and this density satisfies

$$
\frac{\varphi_x(w_1, \ldots, w_k)}{\phi(w_1)\cdots\phi(w_k)} = \left(\frac{d}{2}\right)^{-k/2}\frac{\Gamma(d/2)}{\Gamma((d-k)/2)}\,\det(S_k)^{-1/2}\left(1 - \frac{x^2\,\iota'S_k^{-1}\iota}{d}\right)^{(d-k-2)/2} e^{(k/2)x^2}
$$

if $S_k$ is invertible with $x^2\iota'S_k^{-1}\iota < d$, and $\varphi_x(w_1, \ldots, w_k) = 0$ otherwise, where $S_k = (w_i'w_j/d)_{i,j=1}^{k}$ denotes the $k \times k$ matrix of scaled inner products of the $w_i$'s, and $\iota = (1, \ldots, 1)'$ denotes an appropriate vector of ones.
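The density ratio in Proposition 2.3 is easy to evaluate numerically on the log scale. The sketch below (hypothetical function names, our own cross-check rather than part of the proof) implements the displayed formula and compares it, for $k = 1$, with an independent derivation: $W_1$ is rotationally symmetric with $\|W_1\|^2 = x^2 + \chi^2_{d-1}$, so at a point $w$ with $s = \|w\|^2$ the ratio equals $p_{d-1}(s - x^2)/p_d(s)$, where $p_n$ is the $\chi^2_n$ density.

```python
import math
import numpy as np

def prop23_log_ratio(gram, x, d):
    """Log of phi_x(w_1,...,w_k) / (phi(w_1)...phi(w_k)) per Proposition 2.3.

    gram is S_k = (w_i'w_j / d)_{i,j}; returns -inf where the density vanishes.
    """
    k = gram.shape[0]
    sign, logdet = np.linalg.slogdet(gram)
    if sign <= 0:  # det(S_k) <= 0: for a Gram matrix this means S_k is singular
        return -math.inf
    ones = np.ones(k)
    q = x * x * float(ones @ np.linalg.solve(gram, ones))  # x^2 iota' S_k^{-1} iota
    if q >= d:
        return -math.inf
    return (-(k / 2) * math.log(d / 2)
            + math.lgamma(d / 2) - math.lgamma((d - k) / 2)
            - 0.5 * logdet
            + ((d - k - 2) / 2) * math.log1p(-q / d)
            + (k / 2) * x * x)

def chi2_logpdf(t, n):
    """Log density of the chi-square distribution with n degrees of freedom at t > 0."""
    return (n / 2 - 1) * math.log(t) - t / 2 - (n / 2) * math.log(2) - math.lgamma(n / 2)

# k = 1 cross-check at a point w with ||w||^2 = s.
d, x, s = 10, 0.7, 9.3
lhs = prop23_log_ratio(np.array([[s / d]]), x, d)
rhs = chi2_logpdf(s - x * x, d - 1) - chi2_logpdf(s, d)
print(lhs, rhs)
```

Working with logarithms (via `math.lgamma` and `np.linalg.slogdet`) avoids overflow of $\Gamma(d/2)$ for large $d$.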

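Using Proposition 2.3, we can rewrite the quantities of interest in (2.1), (2.2), and (2.3) as follows: For example, (2.2) equals

$$
\begin{aligned}
&\int\!\!\int \left(w_1'w_2 - x^2\right)\frac{f(w_1)f(w_2)}{\phi(w_1)\phi(w_2)}\,\varphi_x(w_1, w_2)\,dw_1\,dw_2\\
&\quad= \int\!\!\int \left(w_1'w_2 - x^2\right)\frac{\varphi_x(w_1, w_2)}{\phi(w_1)\phi(w_2)}\,f(w_1)f(w_2)\,dw_1\,dw_2
= E\left[\left(Z_1'Z_2 - x^2\right)\frac{\varphi_x(Z_1, Z_2)}{\phi(Z_1)\phi(Z_2)}\right],
\end{aligned}
\tag{2.4}
$$

where $Z_1$ and $Z_2$ are i.i.d. copies of $Z$. In a similar fashion, the quantities in (2.3) can be written as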



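$$
\begin{gathered}
\sum_{j=1}^{k}\binom{k}{j}(-1)^j\,E\left[\left(Z_1'Z_2\cdots Z_j'Z_1 - d + 1 - x^{2j}\right)\frac{\varphi_x(Z_1, \ldots, Z_k)}{\phi(Z_1)\cdots\phi(Z_k)}\right]
\quad\text{and}\\
E\left[\left(\prod_{i=1}^{m} Z_{j_{i-1}+1}'Z_{j_{i-1}+2}\cdots Z_{j_i-1}'Z_{j_i} - x^{2(j_m - m)}\right)\frac{\varphi_x(Z_1, \ldots, Z_k)}{\phi(Z_1)\cdots\phi(Z_k)}\right]
\end{gathered}
\tag{2.5}
$$

for $Z_i$, $i = 1, \ldots, k$, i.i.d. as $Z$. And finally (2.1) reduces to

$$
E\left[\frac{\varphi_x(Z_1, Z_2)}{\phi(Z_1)\phi(Z_2)}\right] - 2E\left[\frac{\varphi_x(Z_1)}{\phi(Z_1)}\right] + 1.
\tag{2.6}
$$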



Recall that our goal is to show that the expressions in (2.4)-(2.6) converge to zero as $d \to \infty$, uniformly in $x$ over compact subsets of $\mathbb{R}$. We show, in fact, a slightly stronger statement, motivated by the obvious conjecture that the expected values of density ratios in (2.4)-(2.6), like $\varphi_x(Z_1, Z_2)/(\phi(Z_1)\phi(Z_2))$, converge to one, and also by the observation that the expression in (2.4) is a special case of the second expression in (2.5) with $k$ replaced by 2. For an even integer $k$ and for each $l = 1, \ldots, k$, for each $m \ge 0$, and for any indices $j_0, \ldots, j_m$ that satisfy $j_0 = 0$, $j_m \le l$, and $j_{i-1} + 1 < j_i$ whenever $0 < i \le m$, consider the expressions



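$$
E\left[\left(\prod_{i=1}^{m} Z_{j_{i-1}+1}'Z_{j_{i-1}+2}\cdots Z_{j_i-1}'Z_{j_i}\right)\frac{\varphi_x(Z_1, \ldots, Z_l)}{\phi(Z_1)\cdots\phi(Z_l)}\right] - x^{2(j_m - m)}
\tag{2.7}
$$

(for $m = 0$ the empty product equals one, so that (2.7) reduces to $E[\varphi_x(Z_1, \ldots, Z_l)/(\phi(Z_1)\cdots\phi(Z_l))] - 1$), and also the expression

$$
\sum_{j=1}^{k}\binom{k}{j}(-1)^j\,E\left[\left(Z_1'Z_2\cdots Z_j'Z_1 - d\right)\frac{\varphi_x(Z_1, \ldots, Z_k)}{\phi(Z_1)\cdots\phi(Z_k)}\right] - (1 - x^2)^k.
\tag{2.8}
$$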



Convergence to zero of the expressions in (2.7) corresponding to $k = 2$ (uniformly over compacts in $x$) entails convergence to zero of (2.4) and (2.6) (uniformly over compacts in $x$). And if both (2.7) and (2.8) converge to zero for some even integer $k$ (uniformly over compacts), then the expressions in (2.5) corresponding to that same $k$ also converge to zero (uniformly over compacts). [Convergence to zero of (2.4) follows from convergence to zero of (2.7) with $m = 1$, $j_m = 2$ and $l = 2$ together with convergence to zero of (2.7) with $m = 0$ and $l = 2$. Convergence to zero of (2.6) follows from convergence to zero of (2.7) with $m = 0$ and $l = 1$ together with convergence to zero of (2.7) with $m = 0$ and $l = 2$. For the first expression in (2.5), convergence to zero follows from convergence to zero of (2.7) with $m = 0$ and $l = k$ and from convergence to zero of (2.8), in view of the binomial theorem. Similarly, for the second expression in (2.5), convergence to zero follows from convergence to zero of (2.7) with $l = k$ and of its special case where $m = 0$ and $l = k$.]
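The bracketed reductions are direct algebra. Writing $r_l = \varphi_x(Z_1, \ldots, Z_l)/(\phi(Z_1)\cdots\phi(Z_l))$ as shorthand (our notation, not the paper's), one has

$$
\begin{aligned}
\text{(2.4)} &= \left(E[Z_1'Z_2\,r_2] - x^2\right) - x^2\left(E[r_2] - 1\right),\\
\text{(2.6)} &= \left(E[r_2] - 1\right) - 2\left(E[r_1] - 1\right),
\end{aligned}
$$

where every bracket is an instance of (2.7). For the first expression in (2.5), taken as $\sum_{j=1}^{k}\binom{k}{j}(-1)^j E[(Z_1'Z_2\cdots Z_j'Z_1 - d + 1 - x^{2j})\,r_k]$, the binomial theorem gives $\sum_{j=1}^{k}\binom{k}{j}(-1)^j = -1$ and $\sum_{j=1}^{k}\binom{k}{j}(-1)^j x^{2j} = (1 - x^2)^k - 1$, hence $\sum_{j=1}^{k}\binom{k}{j}(-1)^j(1 - x^{2j}) = -(1 - x^2)^k$, so that

$$
\sum_{j=1}^{k}\binom{k}{j}(-1)^j E\left[\left(Z_1'Z_2\cdots Z_j'Z_1 - d + 1 - x^{2j}\right) r_k\right] = \text{(2.8)} - (1 - x^2)^k\left(E[r_k] - 1\right).
$$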

The expressions in (2.7)-(2.8) both involve expected values of a polynomial in $Z_i'Z_j$ for some pairs $(i, j)$, multiplied by $\varphi_x(Z_1, \ldots, Z_l)/(\phi(Z_1)\cdots\phi(Z_l))$, $l = 1, \ldots, k$. To proceed, we need the polynomial approximation to $\varphi_x(Z_1, \ldots, Z_l)/(\phi(Z_1)\cdots\phi(Z_l))$ that is provided by the next result.



Proposition 2.4. Fix $M > 0$ and $x$ satisfying $|x| \le M$. Moreover, consider integers $k$ and $d$ that satisfy $k \ge 1$ and $d > \max\{3k, 2(k+1)M^2\}$, and $d$-vectors $w_1, \ldots, w_k$ that are such that the $k \times k$ matrix $S_k = (w_i'w_j/d)_{i,j=1}^{k}$ satisfies $\|S_k - I_k\| < 1/(2k)$. Then $\varphi_x(w_1, \ldots, w_k)$ is such that

$$
\frac{\varphi_x(w_1, \ldots, w_k)}{\phi(w_1)\cdots\phi(w_k)} = \psi_x(S_k - I_k) + \Delta,
$$

where $\psi_x(S_k - I_k)$ is a polynomial of degree up to $k$ in the elements of $S_k - I_k$. The coefficients of the polynomial $\psi_x(\cdot)$ depend on $k$, $x$ and $d$, and are bounded, in absolute value and uniformly in $x \in [-M, M]$, by a constant $C$, and $\Delta$ satisfies $|\Delta| \le D\|S_k - I_k\|^{k+1}$, for some constants $C = C(k, M)$ and $D = D(k, M)$ that depend only on $k$ and on $M$. Moreover, both $\psi_x(S_k - I_k)$ and $\Delta$ are invariant under permutations of the $w_i$'s so that, in particular, $\psi_x(S_k - I_k)$ is unchanged when $S_k$ is replaced by the matrix $(w_{\pi(i)}'w_{\pi(j)}/d)_{i,j=1}^{k}$ for any permutation $\pi$ of $k$ elements. [The coefficients of $\psi_x(\cdot)$ and the bounds $C$ and $D$ can be obtained explicitly upon inspection of the proof.]

When studying the expected values in (2.7)-(2.8), Proposition 2.4 suggests that the density ratio $\varphi_x(Z_1, \ldots, Z_k)/(\phi(Z_1)\cdots\phi(Z_k))$ can be approximated by the polynomial $\psi_x(S_k - I_k)$, where now $S_l = (Z_i'Z_j/d)_{i,j=1}^{l}$ for $l \le k$. The resulting approximations to (2.7) and (2.8) are

$$
E\left[\left(\prod_{i=1}^{m} Z_{j_{i-1}+1}'Z_{j_{i-1}+2}\cdots Z_{j_i-1}'Z_{j_i}\right)\psi_x(S_l - I_l)\right] - x^{2(j_m - m)}
\tag{2.9}
$$

and

$$
\sum_{j=1}^{k}\binom{k}{j}(-1)^j\,E\left[\left(Z_1'Z_2\cdots Z_j'Z_1 - d\right)\psi_x(S_k - I_k)\right] - (1 - x^2)^k,
\tag{2.10}
$$

respectively. For these approximations to be useful, we need to show that the difference of (2.7) and (2.9), and also the difference of (2.8) and (2.10), converge to zero as $d \to \infty$, uniformly in $x$ over compact subsets of $\mathbb{R}$. The technical difficulty here is that expressions like, for example, $Z_1'Z_2\cdots Z_j'Z_1 - d$ in (2.10) have zero mean but do not converge to zero in probability. To deal with this, we rely on conditions (t1)(a) and (t2).

Proposition 2.5. For each $d$, consider a random $d$-vector $Z$ that has a Lebesgue density, that is standardized such that $EZ = 0$ and $EZZ' = I_d$, and that satisfies conditions (t1)(a) and (t2) for some fixed integer $k$. Let $H(S_k - I_k)$ be a (fixed) monomial in the elements of $S_k - I_k$ whose degree, denoted by $\deg(H)$, satisfies $0 \le \deg(H) \le k$. Then

$$
E\left[d^{(k + \deg(H))/2}\,\left|H(S_k - I_k)\right|\,\left|\frac{\varphi_x(Z_1, \ldots, Z_k)}{\phi(Z_1)\cdots\phi(Z_k)} - \psi_x(S_k - I_k)\right|\right]
$$

converges to zero as $d \to \infty$, uniformly in $x$ over compact subsets of $\mathbb{R}$.



If Proposition 2.5 applies, then it is not difficult to see that the difference between (2.7) and (2.9), and also the difference between (2.8) and (2.10), converge to zero, uniformly in $x$ over compact subsets of $\mathbb{R}$, as required. For example, consider the difference of (2.8) and (2.10), which both involve $k$ expected values indexed by $j = 1, \ldots, k$, and focus on the difference of those expected values corresponding to the index $j$. Also, recall that $k$ is an even integer, so that $k > 1$. For $j = 1$, we simply use Proposition 2.5 with the monomial $(S_k - I_k)_{1,1}$, and note that $|Z_1'Z_1 - d| = d|(S_k - I_k)_{1,1}| \le d^{(k+1)/2}|(S_k - I_k)_{1,1}|$. For $j > 1$, we first write $|Z_1'Z_2\cdots Z_j'Z_1|$ as

$$
d^j\left|(S_k - I_k)_{1,2}\cdots(S_k - I_k)_{j,1}\right| \le d^{(k+j)/2}\left|(S_k - I_k)_{1,2}\cdots(S_k - I_k)_{j,1}\right|.
$$

Now use Proposition 2.5 twice, first with the monomial $(S_k - I_k)_{1,2}\cdots(S_k - I_k)_{j,1}$ of degree $j \le k$, and then with the degree-zero monomial, and note that $d \le d^{k/2}$ here, to see that the difference of expected values corresponding to the index $j > 1$ also converges to zero, uniformly in $x$ over compact subsets of $\mathbb{R}$. The difference of (2.7) and (2.9) is treated similarly.

To show that the expressions in (2.9) and (2.10) converge to zero, the following observation will be useful: If the $Z_i$'s in (2.9) and (2.10) are replaced by independent standard normal vectors $V_i$ (and if $S_l$ and $S_k$ are replaced by the corresponding Gram matrices of the $V_i$'s), then the resulting expressions both converge to zero as $d \to \infty$, uniformly in $x$ over compact subsets of $\mathbb{R}$. To establish convergence to zero of (2.9)-(2.10), uniformly on compacts in $x$, it therefore suffices to study the differences between the expressions in (2.9)-(2.10) and the same expressions with the $Z_i$'s replaced by $V_i$'s that are i.i.d. standard normal, and to show that these differences converge to zero as $d \to \infty$, uniformly in $x$ over compact subsets of $\mathbb{R}$. (To derive the last observation, we first note that (2.7) and (2.8), with the $Z_i$'s replaced by the $V_i$'s, are both equal to zero. Indeed, with this replacement, the expectation in (2.7) is equal to $E[\prod_{i=1}^{m} W_{j_{i-1}+1}'W_{j_{i-1}+2}\cdots W_{j_i-1}'W_{j_i}]$, because $\phi(v_1)\cdots\phi(v_l)$ is the joint density of $V_1, \ldots, V_l$, and because $\varphi_x(w_1, \ldots, w_l)$ is the joint density of $W_1, \ldots, W_l$. Conditional on $b$, the $W_i$'s are i.i.d., with $E[W_i\,\|\,b] = xb$ and $E[W_iW_i'\,\|\,b] = I_d + (x^2 - 1)bb'$. In view of this, it is elementary to verify that (2.7) with the $Z_i$'s replaced by the $V_i$'s is equal to zero. A similar argument applies, mutatis mutandis, to (2.8). Next, we note that Proposition 2.5 applies if $Z$ is replaced by a standard normal vector $V$ [that conditions (t1)(a) and (t2) hold when $Z$ is replaced by $V$ follows either from Example A.1 or upon a simple direct computation]. When replacing $Z$ by $V$ throughout, this entails that (2.9) and (2.10) converge to the same limit as (2.7) and (2.8), that is, to zero, uniformly over compacts in $x$.)
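The "elementary" Gaussian computations can be illustrated numerically. Given $b$, the $W_i$ are i.i.d. with mean $xb$ and second moment $\Sigma = I_d + (x^2 - 1)bb'$, so a chain $W_a'W_{a+1}\cdots W_{c-1}'W_c$ has mean $x\,b'\Sigma^{c-a-1}b\,x = x^{2(c-a)}$, and a closed chain $W_1'W_2\cdots W_j'W_1$ has mean $\operatorname{trace}\Sigma^j = d - 1 + x^{2j}$. The sketch below (hypothetical helper names, arbitrary choices $x = 0.8$, $d = 5$) checks a few of these means by simulation and then checks exactly, with rational arithmetic, that the resulting Gaussian value of the sum in (2.8) is $(1 - x^2)^k$.

```python
import math
from fractions import Fraction
import numpy as np

def sample_w(x, d, k, n, rng):
    """n independent draws of (W_1, ..., W_k), W_i = x b + (I - b b') V_i,
    with b uniform on the unit sphere (shared by all i) and V_i i.i.d. N(0, I_d)."""
    b = rng.standard_normal((n, d))
    b /= np.linalg.norm(b, axis=1, keepdims=True)
    v = rng.standard_normal((n, k, d))
    proj = np.einsum("nkd,nd->nk", v, b)  # b'V_i for each draw
    return x * b[:, None, :] + v - proj[:, :, None] * b[:, None, :]

def inner(w, i, j):
    """Row-wise inner products W_i'W_j."""
    return np.einsum("nd,nd->n", w[:, i], w[:, j])

rng = np.random.default_rng(2)
x, d = 0.8, 5
w = sample_w(x, d, k=3, n=100_000, rng=rng)
print(inner(w, 0, 1).mean(), x ** 2)                     # W_1'W_2
print((inner(w, 0, 1) * inner(w, 1, 2)).mean(), x ** 4)  # W_1'W_2 W_2'W_3
print(inner(w, 0, 0).mean(), d - 1 + x ** 2)             # W_1'W_1

def gaussian_28_sum(k, x):
    """sum_j C(k,j)(-1)^j (E[W_1'W_2...W_j'W_1] - d), using E[...] = d - 1 + x^(2j)."""
    return sum(math.comb(k, j) * (-1) ** j * (x ** (2 * j) - 1) for j in range(1, k + 1))

print(gaussian_28_sum(4, Fraction(4, 5)) == (1 - Fraction(4, 5) ** 2) ** 4)  # True
```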

To put this idea to work, expand $\psi_x(S_k - I_k)$ into a weighted sum of monomials (in the elements of $S_k - I_k$), where the weight of each monomial in the sum is given by the coefficient of that monomial in $\psi_x(S_k - I_k)$; similarly, $\psi_x(S_l - I_l)$ can also be written as a weighted sum of such monomials for each $l \le k$. We see that the integrand in (2.9) for $m = 0$, that is, $\psi_x(S_l - I_l)$, can be written as the weighted sum of monomials in the elements of $S_k - I_k$. Similarly, for $Z$ as in Theorem 2.1, the integrand in (2.9) for $m > 0$ can be written as the weighted sum of expressions of the form

$$
d^{\deg(G)}\left(G - E[G]\right)H
\tag{2.11}
$$

for two monomials $G$ and $H$ in the elements of $S_k - I_k$ of degree $k$ or less, where $G$ is given by the monomial

$$
\prod_{i=1}^{m}(S_k - I_k)_{j_{i-1}+1,\,j_{i-1}+2}\cdots(S_k - I_k)_{j_i-1,\,j_i}
\tag{2.12}
$$

of degree $j_m - m$, for some $m > 0$ and indices $j_0, \ldots, j_m$ that satisfy $j_0 = 0$, $j_m \le k$, and $j_{i-1} + 1 < j_i$ whenever $0 < i \le m$. In this weighted expansion of (2.9), note that the weight of (2.11) depends on $x$, on $H$ (through its degrees) and also on $d$, in such a way that the weight is bounded in $x$ and $d$ as long as $x$ is restricted to a compact set (cf. Proposition 2.4). Lastly, consider the integrand in (2.10), for $Z$ as in Theorem 2.1. Arguing as before, we can write that integrand as the weighted sum of expressions of the form (2.11), where here $G$ is given by the monomial

$$
(S_k - I_k)_{1,2}\cdots(S_k - I_k)_{j-1,j}(S_k - I_k)_{j,1}
\tag{2.13}
$$

of degree $j$ for some $j$ satisfying $1 \le j \le k$. And, again, in this weighted sum, the weight of each term depends on $x$ and on $H$ (through its degrees), and that weight is bounded in $x$ and $d$ over compacts in $x$. (Note that, under the assumptions of Theorem 2.1, it is elementary to verify that $E[G] = 0$ whenever $G$ is given by (2.12) and also whenever $G$ is given by (2.13) with $j = 1$, and that $d^{\deg(G)}E[G] = d$ if $G$ is given by (2.13) with $j > 1$.)
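The expectations in the preceding parenthetical follow from independence together with $EZ = 0$ and $EZZ' = I_d$. For $G$ as in (2.13) with $j \ge 2$, we have $d^{j}G = Z_1'Z_2\,Z_2'Z_3\cdots Z_j'Z_1 = \operatorname{trace}(Z_1Z_1'Z_2Z_2'\cdots Z_jZ_j')$, so

$$
d^{\deg(G)}E[G] = \operatorname{trace}\left(E[Z_1Z_1']\,E[Z_2Z_2']\cdots E[Z_jZ_j']\right) = \operatorname{trace}(I_d) = d,
$$

while for $j = 1$, $E[G] = E[Z_1'Z_1]/d - 1 = \operatorname{trace}(I_d)/d - 1 = 0$. For $G$ as in (2.12), each chain $(S_k - I_k)_{a,a+1}\cdots(S_k - I_k)_{c-1,c}$ with $a = j_{i-1} + 1 < c = j_i$ consists of off-diagonal entries $Z_{a}'Z_{a+1}/d, \ldots$, so its first vector $Z_a$ appears exactly once, and distinct chains use disjoint sets of indices; conditioning on all other $Z_i$'s and using $EZ = 0$ gives $E[G] = 0$.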

Proposition 2.6. For each $d \ge 1$, assume that $Z$ is as in Theorem 2.1. Fix an integer $k \ge 1$, and let $G$ and $H$ be two (fixed) monomials in the elements of $S_k - I_k$ of degree $k$ or less, define $G^*$ and $H^*$ as $G$ and $H$, respectively, but with the $Z_1, \ldots, Z_k$ replaced by i.i.d. standard Gaussian $d$-vectors, and consider

$$
E\left[d^{\deg(G)}\left(G - E[G]\right)H\right] - E\left[d^{\deg(G^*)}\left(G^* - E[G^*]\right)H^*\right].
\tag{2.14}
$$

(i) Assume that condition (t1)(a) applies with the integer $k$ as chosen here, and that $k \le 4$. Then $E[H] - E[H^*]$ and also the expression in (2.14) converge to zero as $d \to \infty$ for each monomial $G$ as in (2.12).

(ii) Assume that condition (t1) is satisfied with the integer $k$ as chosen here. Let $G$ be given by the monomial in (2.13) for some $j$, $1 \le j \le k$. Then the expression in (2.14) converges to zero as $d \to \infty$, unless either (a) $H = (S_k - I_k)_{a,a}$ for some $a$ satisfying $1 \le a \le j$, (b) $H = (S_k - I_k)_{a,b}$ with $1 \le a < b \le j$, or (c) $H = ((S_k - I_k)_{a,b})^2$ with $1 \le a < b \le j$. In case (a), the expression in (2.14) is equal to $\operatorname{Var}[Z_1'Z_1]/d - 2$; in case (b), it is equal to $E[(Z_1'Z_2)^3]/d$; and in case (c), it equals $\operatorname{Var}[(Z_1'Z_2)^2]/d^2 - 2(1 + 3/d)$.
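The constants $2$, $0$ and $2(1 + 3/d)$ in cases (a), (b) and (c) are the Gaussian values of $\operatorname{Var}[V_1'V_1]/d$, $E[(V_1'V_2)^3]/d$ and $\operatorname{Var}[(V_1'V_2)^2]/d^2$ for $V_1, V_2$ i.i.d. $N(0, I_d)$: indeed $V_1'V_1 \sim \chi^2_d$ has variance $2d$, $V_1'V_2$ is symmetric, and $E[(V_1'V_2)^4] = 3E\|V_2\|^4 = 3d(d+2)$. The Monte Carlo sketch below (a hypothetical helper, with $d = 10$ chosen arbitrarily) reproduces these values; for a non-Gaussian $Z$, the same three statistics measure how far the case values in Proposition 2.6(ii) are from zero.

```python
import numpy as np

def gaussian_case_constants(d, n=200_000, seed=3):
    """Monte Carlo estimates of Var[V1'V1]/d, E[(V1'V2)^3]/d and Var[(V1'V2)^2]/d^2
    for V1, V2 i.i.d. N(0, I_d)."""
    rng = np.random.default_rng(seed)
    v1 = rng.standard_normal((n, d))
    v2 = rng.standard_normal((n, d))
    sq_norm = np.einsum("nd,nd->n", v1, v1)  # V1'V1, chi-square with d degrees of freedom
    inner = np.einsum("nd,nd->n", v1, v2)    # V1'V2
    return (sq_norm.var() / d,
            np.mean(inner ** 3) / d,
            np.var(inner ** 2) / d ** 2)

d = 10
est = gaussian_case_constants(d)
exact = (2.0, 0.0, 2.0 * (1.0 + 3.0 / d))
print(est, exact)
```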

To complete the proof of Theorem 2.1, let $Z$ be as in the theorem. We first assume that conditions (t1)(a) and (t2) are satisfied with $k = 2$. The relation (1.3) holds for each $x$ and $\epsilon$ if we can show that the expression in (2.9) converges to zero for each collection of indices $l, m, j_0, \ldots, j_m$ such that $1 \le l \le 2$, $m \ge 0$, $j_0 = 0$, $j_m \le l$, and $j_{i-1} + 1 < j_i$ for each $i = 1, \ldots, m$. Moreover, (1.5) holds for each $\epsilon > 0$ if convergence to zero of (2.9) is uniform in $x$ over compacts. To this end, consider the difference of the expression in (2.9) and of the same expression with the $Z_i$'s replaced by $V_i$'s that are i.i.d. standard normal $d$-vectors. Expanding the polynomial $\psi_x$ into a weighted sum of monomials, the difference in question can be written as a weighted sum of expressions of the form $E[H] - E[H^*]$ in case $m = 0$, and as a weighted sum of expressions of the form (2.14) in case $m > 0$, where the weight is given by the coefficient of the monomial $H$ in $\psi_x$, and where $G$ is of the form (2.12) [the monomials $H$, $G^*$ and $H^*$ are as in Proposition 2.6(i)]. By Proposition 2.4, we see that the coefficients of $\psi_x$ are bounded uniformly in $d$ and uniformly in $x$ over compacts. And by Proposition 2.6(i), we see that expressions of the form $E[H] - E[H^*]$ or of the form (2.14) with $G$ as in (2.12) all converge to zero. Therefore, (2.9) converges to zero, uniformly in $x$ over compact subsets of $\mathbb{R}$.

Finally, we assume that conditions (t1) and (t2) hold with $k = 4$. To derive (1.4) for fixed $x$ and $\epsilon$, we show that the expressions in (2.9) and (2.10) converge to zero [with the indices $l, m, j_0, \ldots, j_m$ in (2.9) now such that $1 \le l \le 4$, $m \ge 0$, $j_0 = 0$, $j_m \le l$ and $j_{i-1} + 1 < j_i$ for each $i = 1, \ldots, m$]. And (1.6) holds for each $\epsilon > 0$ if convergence in (2.9) and (2.10) is uniform in $x$ over compact sets. Now convergence to zero of (2.9) (with $k = 4$ here), uniformly over compacts, follows by arguing as in the preceding paragraph, mutatis mutandis. To deal with (2.10), consider the difference of (2.10) and of the same expression with the $Z_i$'s replaced by i.i.d. standard Gaussian $V_i$'s. Again, this can be written as a weighted sum of expressions of the form (2.14), with $G$ now as in (2.13), where the weights are bounded uniformly in $d$ and uniformly in $x$ over compacts in view of Proposition 2.4. And by Proposition 2.6(ii), we see that (2.14) converges to zero except for those $H$'s that correspond to the cases (a), (b) and (c) in Proposition 2.6(ii). Write $\mathcal{H}_a$, $\mathcal{H}_b$, and $\mathcal{H}_c$ for the collection of all monomials $H$ where the case (a), (b), or (c) of Proposition 2.6(ii) occurs, respectively. For each $H \in \mathcal{H}_a$, the value of (2.14) is given by $\operatorname{Var}[Z_1'Z_1]/d - 2$ and hence does not depend on $H$ in view of Proposition 2.6(ii). And for each $H \in \mathcal{H}_a$, the coefficient of $H$ in the polynomial $\psi_x(S_k - I_k)$ also does not depend on $H$ in view of Proposition 2.4, because the monomials in $\mathcal{H}_a$ can be obtained from each other by permutations (or re-labelings) of the $Z_i$'s. Consider now the difference of (2.10) and the same expression with the $Z_i$'s replaced by i.i.d. standard Gaussians. The combined contribution of the monomials in $\mathcal{H}_a$ to that difference is given by

$$
\sum_{j=1}^{k}\binom{k}{j}(-1)^j\, j
$$

multiplied by a constant (namely by $\operatorname{Var}[Z_1'Z_1]/d - 2$ times the common coefficient of the monomials from $\mathcal{H}_a$ in $\psi_x(S_k - I_k)$). Similarly, the monomials in $\mathcal{H}_b$ contribute

$$
\sum_{j=1}^{k}\binom{k}{j}(-1)^j \binom{j}{2}
$$

multiplied by a constant. And the combined contribution of the monomials in $\mathcal{H}_c$ is also given by the expression in the preceding display multiplied by another constant. Because we have $k = 4$ here, the expressions in the last two displays are both equal to zero. Except for the more technical arguments that we collect in the supplementary material [13], this concludes the proof of Theorem 2.1.
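The role of $k = 4$ can be seen from a binomial counting argument. For a given $j$, case (a) of Proposition 2.6(ii) contains $j$ monomials ($a = 1, \ldots, j$) and cases (b) and (c) each contain $\binom{j}{2}$ monomials ($1 \le a < b \le j$); with the weights $\binom{k}{j}(-1)^j$ from (2.10), the combined contributions are proportional to $\sum_{j=1}^{k}\binom{k}{j}(-1)^j j$ and $\sum_{j=1}^{k}\binom{k}{j}(-1)^j\binom{j}{2}$. Since $\sum_j \binom{k}{j}\binom{j}{r}(-1)^j = (-1)^r\binom{k}{r}(1 - 1)^{k - r}$, the first sum vanishes for $k \ge 2$ and the second only for $k \ge 3$. The short sketch below (hypothetical function names) evaluates both sums exactly, which is consistent with the remark that $k = 2$ typically does not suffice while $k = 4$ does.

```python
from math import comb

def h_a_weight(k):
    """sum_{j=1}^k C(k,j) (-1)^j * j: combined count-weight of the case-(a) monomials."""
    return sum(comb(k, j) * (-1) ** j * j for j in range(1, k + 1))

def h_bc_weight(k):
    """sum_{j=1}^k C(k,j) (-1)^j * C(j,2): combined count-weight for cases (b) and (c)."""
    return sum(comb(k, j) * (-1) ** j * comb(j, 2) for j in range(1, k + 1))

for k in (2, 4):
    print(k, h_a_weight(k), h_bc_weight(k))
# prints "2 0 1" and then "4 0 0"
```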

SUPPLEMENTARY MATERIAL 

Appendix: Proofs for Section 2 (DOI: 10.1214/12-AOS1081SUPP; .pdf). The Appendix contains several more technical arguments that are used in Section 2 including, in particular, Examples A.1 and A.2, as well as the proofs of Propositions 2.2 through 2.6.